324 ◾ Bioinformatics
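mkdir checkM_out
# Healthy sample: -t 4 runs four threads, -x fa selects bin files ending in .fa,
# -f writes the summary report to a text file; the two positional arguments are
# the input bin directory and the CheckM output directory
mkdir checkM_out/healthy
checkm lineage_wf \
    -t 4 \
    -x fa \
    -f checkM_out/healthy_checkm_report.txt \
    binning/healthy \
    checkM_out/healthy
# Moderate sample (same options)
mkdir checkM_out/moderate
checkm lineage_wf \
    -t 4 \
    -x fa \
    -f checkM_out/moderate_checkm_report.txt \
    binning/moderate \
    checkM_out/moderate
# Severe sample (same options)
mkdir checkM_out/severe
checkm lineage_wf \
    -t 4 \
    -x fa \
    -f checkM_out/severe_checkm_report.txt \
    binning/severe \
    checkM_out/severe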
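The report (Figure 8.6) shows, for each bin, the bin ID; the marker lineage (the taxonomic
lineage, with its rank, used to select the marker set); # genomes (the number of reference
genomes used to infer the marker set); # markers (the number of marker genes); # marker
sets (the number of collocated sets into which those markers are grouped); the columns 0
to 5+ (the number of marker genes identified 0, 1, 2, 3, 4, or 5 or more times in the bin);
completeness (estimated from the presence of marker genes); contamination (estimated
from marker genes found in more than one copy); and strain heterogeneity (a high value
indicates that the contamination comes from one or more closely related organisms).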
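The completeness and contamination columns can be summarized with the formulas below. This is a sketch based on our reading of the original CheckM paper, not a transcription of the program's code; the notation is introduced here for illustration: M is the collection of marker sets for the bin's lineage, s is one marker set, G is the set of marker genes found in the bin, and N_g is the number of copies of marker gene g identified in the bin.

```latex
\text{Completeness} = \frac{100}{|M|}\sum_{s \in M}\frac{|s \cap G|}{|s|},
\qquad
\text{Contamination} = \frac{100}{|M|}\sum_{s \in M}\frac{\sum_{g \in s} C_g}{|s|},
\qquad
C_g = \begin{cases} N_g - 1 & \text{if } N_g \ge 1 \\ 0 & \text{if } N_g = 0 \end{cases}
```

As a toy example, take two marker sets s1 = {a, b} and s2 = {c, d, e, f}, with a found once, b twice, c once, and d, e, f absent. Completeness is (100/2)(2/2 + 1/4) = 62.5, and contamination is (100/2)((0 + 1)/2 + 0/4) = 25, because only b contributes an extra copy. Averaging over sets rather than over individual genes keeps large, collocated marker sets from dominating the estimate.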
In Figure 8.6, for the moderate sample, the bin “moderate.4” was evaluated with a marker
set inferred from 5449 bacterial genomes and containing 104 marker genes; only 6 of these
marker genes were identified in the bin, its completeness is 10.34%, and no contamination
was detected, so this bin represents only a small fraction of a genome. For more details
about CheckM and its report, refer to the program home page at
“https://ecogenomics.github.io/CheckM/”.
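Before downstream analysis, bins are usually filtered on completeness and contamination. The following is a minimal sketch, assuming the report was written as a tab-separated table (by adding CheckM's --tab_table option to the checkm lineage_wf commands above) whose header line contains the columns "Bin Id", "Completeness", and "Contamination". The function name filter_bins and its default thresholds (at least 50% completeness and less than 10% contamination, the MIMAG medium-quality cut-offs) are illustrative choices, not part of CheckM.

```shell
# Hypothetical helper: print bins from a tab-separated CheckM report that pass
# a completeness floor and a contamination ceiling (assumes --tab_table output).
filter_bins() {
    local report=$1 min_comp=${2:-50} max_cont=${3:-10}
    awk -F'\t' -v minc="$min_comp" -v maxc="$max_cont" '
        NR == 1 {
            # Locate columns by header name rather than by fixed position
            for (i = 1; i <= NF; i++) {
                if ($i == "Bin Id") id = i
                else if ($i == "Completeness") comp = i
                else if ($i == "Contamination") cont = i
            }
            if (!id || !comp || !cont) {
                print "filter_bins: expected Bin Id, Completeness and Contamination columns" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Keep bins at or above the completeness floor and below the contamination ceiling
        ($comp + 0) >= minc + 0 && ($cont + 0) < maxc + 0 {
            print $id "\t" $comp "\t" $cont
        }
    ' "$report"
}
```

For example, filter_bins checkM_out/moderate_checkm_report.txt 50 10 would print the ID, completeness, and contamination of each moderate-sample bin that passes both thresholds. Looking up the columns by name keeps the sketch working even if the column order differs between CheckM versions.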
FIGURE 8.6 Genome completeness evaluation report generated by CheckM.
8.2.9 Prediction of Protein-Coding Region
This step annotates the individual genomes recovered by binning, or the metagenomic
assemblies, with potential gene locations by predicting open reading frames (ORFs). The gene